Search | VHL Regional Portal

Evaluating bias and noise induced by the U.S. Census Bureau's privacy protection methods.

Kenny, Christopher T; McCartan, Cory; Kuriwaki, Shiro; Simko, Tyler; Imai, Kosuke.

Sci Adv ; 10(18): eadl2524, 2024 May 03.

Article in English | MEDLINE | ID: mdl-38691613

ABSTRACT

The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct an independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm used for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measurement File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful without measurement error modeling, especially for Hispanic and multiracial populations. TopDown's postprocessing reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations.

Subject(s)

Algorithms , Bias , Censuses , United States , Humans , Privacy

Widespread partisan gerrymandering mostly cancels nationally, but reduces electoral competition.

Kenny, Christopher T; McCartan, Cory; Simko, Tyler; Kuriwaki, Shiro; Imai, Kosuke.

Proc Natl Acad Sci U S A ; 120(25): e2217322120, 2023 Jun 20.

Article in English | MEDLINE | ID: mdl-37310996

ABSTRACT

Congressional district lines in many US states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the US House under the enacted plan to those under a set of alternative simulated plans that serve as a nonpartisan baseline. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the electoral bias it creates cancels at the national level, giving Republicans two additional seats on average. Geography and redistricting rules separately contribute a moderate pro-Republican bias. Finally, we find that partisan gerrymandering reduces electoral competition and makes the partisan composition of the US House less responsive to shifts in the national vote.

Simulated redistricting plans for the analysis and evaluation of redistricting in the United States.

McCartan, Cory; Kenny, Christopher T; Simko, Tyler; Garcia, George; Wang, Kevin; Wu, Melissa; Kuriwaki, Shiro; Imai, Kosuke.

Sci Data ; 9(1): 689, 2022 Nov 11.

Article in English | MEDLINE | ID: mdl-36369510

ABSTRACT

This article introduces the 50STATESIMULATIONS, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50STATESIMULATIONS allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50STATESIMULATIONS include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.

Unrepresentative big surveys significantly overestimated US vaccine uptake.

Bradley, Valerie C; Kuriwaki, Shiro; Isakov, Michael; Sejdinovic, Dino; Meng, Xiao-Li; Flaxman, Seth.

Nature ; 600(7890): 695-700, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34880504

ABSTRACT

Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox [1]. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi-Facebook [2,3] (about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 every two weeks). In May 2021, Delphi-Facebook overestimated uptake by 17 percentage points (14-20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11-17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios-Ipsos online panel [5] with about 1,000 responses per week following survey research best practices [6] provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework [1] to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.
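
The abstract's closing comparison (250,000 respondents versus a simple random sample of size 10) can be illustrated with the effective-sample-size approximation from the data-defect framework it cites: n_eff ≈ f/(1-f) × 1/ρ², where f = n/N is the sampling fraction and ρ is the correlation between responding to the survey and the outcome. The sketch below is a minimal illustration, not the authors' code. The adult population of 255 million and ρ = 0.01 are assumed values chosen for illustration, and the function name is hypothetical.

```python
def effective_sample_size(n: int, population: int, rho: float) -> float:
    """Approximate size of a simple random sample with the same mean squared
    error as a biased sample of size n, using n_eff ~ f/(1-f) / rho^2."""
    if not 0 < n < population:
        raise ValueError("n must be between 0 and the population size")
    if rho == 0:
        raise ValueError("rho must be non-zero for this approximation")
    f = n / population  # sampling fraction
    return f / (1 - f) / rho**2


# Assumed illustrative inputs (not taken from the abstract):
US_ADULTS = 255_000_000   # rough adult population, an assumption
RHO = 0.01                # hypothetical data defect correlation

n_eff = effective_sample_size(250_000, US_ADULTS, RHO)
print(f"Effective sample size: {n_eff:.1f}")
```

Under these assumptions, a response-outcome correlation of only about 0.01 is enough to shrink 250,000 responses to an effective sample size on the order of 10, because the small correlation is amplified by the very large population.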

Subject(s)

COVID-19 Vaccines/administration & dosage , Health Care Surveys , Vaccination/statistics & numerical data , Benchmarking , Bias , Big Data , COVID-19/epidemiology , COVID-19/prevention & control , Centers for Disease Control and Prevention, U.S. , Datasets as Topic/standards , Female , Health Care Surveys/standards , Humans , Male , Research Design , Sample Size , Social Media , United States/epidemiology , Vaccination Hesitancy/statistics & numerical data

The use of differential privacy for census data and its impact on redistricting: The case of the 2020 U.S. Census.

Kenny, Christopher T; Kuriwaki, Shiro; McCartan, Cory; Rosenman, Evan T R; Simko, Tyler; Imai, Kosuke.

Sci Adv ; 7(41): eabk3283, 2021 Oct 08.

Article in English | MEDLINE | ID: mdl-34613778

ABSTRACT

Census statistics play a key role in public policy decisions and social science research. However, given the risk of revealing individual information, many statistical agencies are considering disclosure control methods based on differential privacy, which add noise to tabulated data. Unlike other applications of differential privacy, however, census statistics must be postprocessed after noise injection to be usable. We study the impact of the U.S. Census Bureau's latest disclosure avoidance system (DAS) on a major application of census statistics, the redrawing of electoral districts. We find that the DAS systematically undercounts the population in mixed-race and mixed-partisan precincts, yielding unpredictable racial and partisan biases. While the DAS leads to a likely violation of the "One Person, One Vote" standard as currently interpreted, it does not prevent accurate predictions of an individual's race and ethnicity. Our findings underscore the difficulty of balancing accuracy and respondent privacy in the Census.
